Utilizing Large Scale Vision and Text Datasets for Image Segmentation from Referring Expressions
نویسندگان
چکیده
Image segmentation from referring expressions is a joint vision and language modeling task, where the input is an image and a textual expression describing a particular region in the image; and the goal is to localize and segment the specific image region based on the given expression. One major difficulty to train such language-based image segmentation systems is the lack of datasets with joint vision and text annotations. Although existing vision datasets such as MS COCO provide image captions, there are few datasets with region-level textual annotations for images, and these are often smaller in scale. In this paper, we explore how existing large scale vision-only and text-only datasets can be utilized to train models for image segmentation from referring expressions. We propose a method to address this problem, and show in experiments that our method can help this joint vision and language modeling task with vision-only and text-only data and outperforms previous results.
منابع مشابه
Topic-Based Structuring of a Very Large-Scale News Video Corpus
We introduce a topic-based inter-video structuring method that considers application to a very large-scale news video corpus as well as user interfaces that provide the users with the ability to efficiently browse through the corpus based on the topic structure. Although the proposed method is a multimedia-integrated method that refers to both text and image based information, this paper focuse...
متن کاملEnhancement of Learning Based Image Matting Method with Different Background/Foreground Weights
The problem of accurate foreground estimation in images is called Image Matting. In image matting methods, a map is used as learning data, which is produced by those pixels that are definitely foreground, definitely background ,and unknown. This three-level pixel map is often referred to as a trimap, which is produced manually in alpha matte datasets. The true class of unknown pixels will be es...
متن کاملThe ParallelEye Dataset: Constructing Large-Scale Artificial Scenes for Traffic Vision Research
Video image datasets are playing an essential role in design and evaluation of traffic vision algorithms. Nevertheless, a longstanding inconvenience concerning image datasets is that manually collecting and annotating large-scale diversified datasets from real scenes is time-consuming and prone to error. For that virtual datasets have begun to function as a proxy of real datasets. In this paper...
متن کاملSample-oriented Domain Adaptation for Image Classification
Image processing is a method to perform some operations on an image, in order to get an enhanced image or to extract some useful information from it. The conventional image processing algorithms cannot perform well in scenarios where the training images (source domain) that are used to learn the model have a different distribution with test images (target domain). Also, many real world applicat...
متن کاملDeep Neural Networks for Semantic Segmentation of Multispectral Remote Sensing Imagery
A semantic segmentation algorithm must assign a label to every pixel in an image. Recently, semantic segmentation of RGB imagery has advanced significantly due to deep learning. Because creating datasets for semantic segmentation is laborious, these datasets tend to be significantly smaller than object recognition datasets. This makes it difficult to directly train a deep neural network for sem...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1608.08305 شماره
صفحات -
تاریخ انتشار 2016